Insert documents into a MongoDB Collection using Python and PyMongo

Overview:

  • Inserting one document or many documents into a database server is a common operation. For example, when a user signs up for a web service on the Internet, the user's details are stored in a database.
  • In MongoDB, a document plays the role that a row or data record plays in a relational database.
  • Documents look like JSON but are stored in MongoDB collections as BSON (binary JSON). A collection is to MongoDB what a table is to an RDBMS.
  • MongoDB does not enforce a schema on a collection by default, so documents with different structures can be inserted into the same collection.
  • In practice, it is usually better to keep documents with different structures in separate, designated collections.
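
As a rough illustration of the last two points, the sketch below groups plain Python dictionaries by their set of field names, which is one way to decide which documents belong together before inserting them. The helper name group_by_shape is hypothetical and not part of PyMongo, and no database is involved.

```python
from collections import defaultdict

def group_by_shape(documents):
    """Group documents (dicts) by their sorted top-level field names."""
    groups = defaultdict(list)
    for document in documents:
        # The "shape" of a document is the tuple of its field names
        shape = tuple(sorted(document))
        groups[shape].append(document)
    return dict(groups)

# Illustrative data: two colour documents and one user document
documents = [
    {"red": 123, "green": 223, "blue": 23},
    {"red": 146, "green": 46, "blue": 246},
    {"name": "alice", "email": "alice@example.com"},
]

for shape, members in group_by_shape(documents).items():
    print(shape, len(members))
```

Each group could then be sent to its own collection, for example one for colours and one for users.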

 

Inserting documents into a MongoDB collection:

  • Import the pymongo module.
  • Create an instance of pymongo.MongoClient.
  • MongoClient takes the host and the port number of the MongoDB server as parameters, or a single connection URI such as mongodb://localhost:27017/.
  • From the MongoClient instance obtain a database instance.
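  • The collections of a MongoDB database are available as attributes of the database instance.
  • Using the object.attribute notation, access the collection and call the insert_one() method on the collection instance. To insert several documents in one call, use insert_many().
  • insert_one() takes a document as a Python dictionary (not as a JSON string) and inserts it into the specified MongoDB collection. The older insert() method is deprecated in PyMongo 3 and removed in PyMongo 4.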
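
The steps above can be sketched as a small helper. It assumes the incoming data arrives as JSON strings, converts each one to a dictionary with the standard json module, and then calls insert_one() or insert_many() on whatever collection object it is given. The function name insert_json_strings is hypothetical; with a live server, the collection argument could be MongoClient('mongodb://localhost:27017/').sample.test.

```python
import json

def insert_json_strings(collection, json_strings):
    """Parse JSON strings into dicts, insert them, and return the new _id values."""
    documents = [json.loads(text) for text in json_strings]
    if not documents:
        # insert_many() rejects an empty list, so there is nothing to do
        return []
    if len(documents) == 1:
        result = collection.insert_one(documents[0])
        return [result.inserted_id]
    result = collection.insert_many(documents)
    return list(result.inserted_ids)
```

Using insert_many() for more than one document sends the batch in a single call rather than one call per document.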

 

Example:

# Import MongoClient from pymongo
from pymongo import MongoClient

# Get a MongoClient object
connectionObject    = MongoClient('mongodb://localhost:27017/')

# Access the database using object.attribute notation
databaseObject      = connectionObject.sample

# Access the MongoDB collection using object.attribute notation
collectionObject    = databaseObject.test

# Insert two simple documents (Python dictionaries) into the test collection
collectionObject.insert_one({"red":123, "green":223, "blue":23})
collectionObject.insert_one({"red":146, "green":46, "blue":246})

# Using find(), query all the documents from the collection
for document in collectionObject.find():
    # Print each document
    print(document)

 

 

Output:

{'_id': ObjectId('5ab9fb8302334a031cc2fc13'), 'red': 123, 'green': 223, 'blue': 23}

{'_id': ObjectId('5ab9fb8302334a031cc2fc14'), 'red': 146, 'green': 46, 'blue': 246}

 

 

Inserting multiple documents into MongoDB concurrently using MongoClient's socket pool:

  • MongoClient is an abstraction over one or more ("1 to n") database connections from a process to a MongoDB server.
  • A MongoClient has a connection pool whose default maximum size (maxPoolSize) is 100.
  • Within a process, one or more threads can share the pool, using up to maxPoolSize connections at a time. When every connection is in use, further threads wait for a free connection for up to waitQueueTimeoutMS milliseconds.
  • PyMongo 3 also accepted a waitQueueMultiple parameter that capped the number of waiting threads. It was removed in PyMongo 4, so the example here does not rely on it.
  • When a thread cannot obtain a connection within the wait limit, MongoClient raises an exception in that thread.
  • The Python example below inserts 400 documents concurrently, using 400 threads that share one MongoClient and its connection pool.
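
As an alternative to starting one thread per document, the pooling behaviour described above can be sketched with a bounded worker pool from the standard library, so that the number of concurrent inserts stays within the number of pooled connections. This is a minimal sketch under stated assumptions, not the article's own method: insert_concurrently and max_workers are hypothetical names, and collection is assumed to be a PyMongo collection obtained from a shared MongoClient.

```python
from concurrent.futures import ThreadPoolExecutor

def insert_concurrently(collection, documents, max_workers=100):
    """Insert each document on a worker thread and return the number inserted."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() waits for every insert to finish and re-raises any exception
        results = list(executor.map(collection.insert_one, documents))
    return len(results)
```

Keeping max_workers at or below maxPoolSize means a worker should rarely have to wait for a connection, which makes waitQueueTimeoutMS errors less likely.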

Example:

from pymongo import MongoClient
from threading import Thread

THREAD_COUNT = 400

# Derive from threading.Thread to create a specialised insert thread
class DataInsertThread(Thread):

    def __init__(self, database_in, threadNumber):
        Thread.__init__(self)
        self.database       = database_in
        self.threadNumber   = threadNumber

    def run(self):
        # insert_one() replaces the insert() method removed in PyMongo 4
        self.database.test.insert_one({"Data inserted by thread":self.threadNumber})

# Get a MongoClient instance
# (waitQueueMultiple is omitted because PyMongo 4 no longer accepts it)
mongoClient     = MongoClient("mongodb://localhost:27017/",
                              maxPoolSize=200,          # connection pool size is 200
                              waitQueueTimeoutMS=200    # how long (in ms) a thread can wait for a connection
                             )

# Get the database object
databaseObject  = mongoClient.sample

insertThreads   = []

# Create insert threads
for threadNum in range(THREAD_COUNT):
    insertThread   = DataInsertThread(databaseObject, threadNum)
    insertThreads.append(insertThread)

    # Start the insert thread
    insertThread.start()

# Wait till all the insert threads are complete
for insertThread in insertThreads:
    insertThread.join()

 

 

Output:

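To see the record count after executing the above example, switch to the sample database in the MongoDB shell and issue the countDocuments() command (the older count() is deprecated). The count is 400 if the test collection was empty before the example ran.

> db.test.countDocuments({})

400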
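
The same check can be made from Python. The sketch below assumes a PyMongo collection and uses count_documents() with an $exists filter, so that only documents written by the insert threads are counted even if the collection held other documents beforehand. The function name count_thread_documents is hypothetical.

```python
def count_thread_documents(collection, field="Data inserted by thread"):
    """Count the documents that contain the field written by the insert threads."""
    return collection.count_documents({field: {"$exists": True}})
```

With a live server, it could be called as count_thread_documents(mongoClient.sample.test).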


Copyright 2024 © pythontic.com